The following DESeq analysis uses lessons from https://github.com/hbctraining/DGE_workshop. Here we will be examining the differential expression of CALU-3 cell genes comparing anaerobic environment to aerobic environment.
### Bioconductor Github and CRAN libraries used
library(tidyverse)
library(RColorBrewer)
library(DESeq2)
library(pheatmap)
library(DEGreport)
library(ggpubr)
library(ggrepel)
library(wesanderson)
library(cowplot)
theme_set(theme_bw())
subread.txt was produces by MSI using the following conditions (copied from the rna-seq report.html provided by UMGC/MSI:
“The RNA-Seq dataset was analyzed using the reference Human (Homo_sapiens) genome assembly “GRCh38” using annotation from Ensembl release 98. The Ensembl GTF annotation file was filtered to remove annotations for non-protein-coding features. Fastq files were evenly subsampled down to a maximum of 100,000 reads per sample. Quality of data in fastq files was assessed with FastQC. A Hisat2 splicesite file was generated from the GTF file. Hisat2 was used to align reads to a reference genome using the options ‘–dta-cufflinks –rna-strandness RF –known-splicesite-infile /panfs/roc/umgc/tmp/scratch/200226_A00223_0330_BH2WT2DSXY/demultiplex_20200302-13-02-18/Hunter_Project_027/rnaseq/splicesites.txt -x /panfs/roc/groups/3/umii/public/ensembl/Homo_sapiens/GRCh38/hisat2//genome’. Ribosomal content was determined by aligning the first 10,000 R1 reads to the Silva ribosomal database and reporting the percent of reads with alignment. Gene-level raw read counts were generated using featureCounts from the Subread R package using the options ‘-s 2 -B -p -Q 10’. Insert sizes were summarized with Picard."
The metadata file was made to include two columns of interest.
1) ATMOSPHERE with two levels: Anaerobic and Aerobic, and
2) TEST which contains the major experimental groups of this analysis.
## Load in data
data <- read.table("../data/subread.txt", header=T, row.names=1)
meta <- read.table("../data/metadata.txt", header=T, row.names=1)
# Check that the row names of the metadata equal the column names of the **raw counts** data
all(colnames(data) == rownames(meta))
[1] TRUE
We are using a simple design ~ TEST here.
# Create DESeq2Dataset object
dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ TEST)
some variables in design formula are characters, converting to factors
The reference sample group is automatically set by alphabetical order. Set the reference to be aerobic.
dds$TEST <- relevel(dds$TEST, ref = "aerobic")
Create a heatmap of the count data to visually explore the relationships between the samples.
# Transform counts for data visualization
rld <- rlog(dds, blind=TRUE)
# Extract the rlog matrix from the object
rld_mat <- assay(rld)
# Compute pairwise correlation values
rld_cor <- cor(rld_mat)
# Plot heatmap
pheatmap(rld_cor)
# Input is a matrix of log transformed values
pca <- prcomp(t(rld_mat))
# Create data frame with metadata and PC3 and PC4 values for input to ggplot
df <- cbind(meta, pca$x)
# PC1 and PC2
PC1PC2 <- ggplot(df) + geom_point(aes(x=PC1, y=PC2, color = TEST), size = 3) +
scale_color_manual(limits = c("aerobic", "anaerobic", "anaerobic_CRS"),
values = wes_palette(n=3, name="Darjeeling1"))
# PC2 and PC3
PC2PC3 <- ggplot(df) + geom_point(aes(x=PC2, y=PC3, color = TEST), size = 3) +
scale_color_manual(limits = c("aerobic", "anaerobic", "anaerobic_CRS"),
values = wes_palette(n=3, name="Darjeeling1"))
# Get legend
PC.Legend <- get_legend(PC2PC3)
# Plot together
plot_grid(PC1PC2 + theme(legend.position = "none"),
PC2PC3 + theme(legend.position = "none"),
PC.Legend,
nrow = 1,
rel_widths = c(1,1,1)
)
It appears the biggest source of variance on the first axis is whether cells were incubated aerobically or anaerobically. I want to look at the third PCA axis to see if variation between anaerobic and anaerobic_CRS samples is captured there.
In the second and third principal component, we can observe the separation of the anaerobic and anaerobic_crs samples. Imagine PC1 is going into the page and that aerobic samples are still differentiated from these two groups.
# Run DESeq2 differentiol expression analysis
dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
# **Optional step** - Output normalized counts to save as a file to access outside RStudio
normalized_counts <- counts(dds, normalized=TRUE)
The next step in the workflow is to fit a curve to the dispersion estimates for each gene. The idea behind fitting a curve to the data is that different genes will have different scales of biological variability, but, over all genes, there will be a distribution of reasonable estimates of dispersion.
# Plot dispersion estimates
plotDispEsts(dds)
Based on this shrinkage curve it looks like our data is a good fit for the model.
Let’s look at the different comparisons that are available to us based on our model:
resultsNames(dds)
[1] "Intercept" "TEST_anaerobic_vs_aerobic" "TEST_anaerobic_CRS_vs_aerobic"
Output the results for two comparisons: aerobic vs anaerobic and anaerobic_CRS vs anaerobic.
Add gene annotation to the results dataframes ## Results ### Annotation Source
Loading required package: BiocFileCache
Loading required package: dbplyr
Attaching package: ‘dbplyr’
The following objects are masked from ‘package:dplyr’:
ident, sql
Attaching package: ‘AnnotationHub’
The following object is masked from ‘package:Biobase’:
cache
## Explore the grch38 table loaded by the annotables library
grch38_df <- data.frame(grch38)
Use the results function to test DEGs for the anaerobic vs. aerobic comparison.
# Output results of Wald test for contrast
contrastATM <- c("TEST", "anaerobic", "aerobic")
resATM <- results(dds, contrast = contrastATM)
resATM <- lfcShrink(dds, contrast = contrastATM, res=resATM, type = "ashr")
using 'ashr' for LFC shrinkage. If used in published research, please cite:
Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
https://doi.org/10.1093/biostatistics/kxw041
summary(resATM)
out of 16239 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 583, 3.6%
LFC < 0 (down) : 386, 2.4%
outliers [1] : 1, 0.0062%
low counts [2] : 9973, 61%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
Convert results objects into dataframes for plotting and export
# Turn the results object into a data frame
resATM_df <- data.frame(resATM) %>%
rownames_to_column("ensgene") %>%
left_join(grch38_df, by = "ensgene") # Join annotation data to dataframe
# Save as csv
write_csv(resATM_df, "../results/DEresults_anaerobicVSaerobic.csv")
# Set filtering parametes for alpha and lfc
padj.cutoff <- 0.05
lfc.cutoff <- 1
# Subset the significant results
sig_resATM_p05 <- dplyr::filter(resATM_df, padj < 0.05 & abs(log2FoldChange) > lfc.cutoff)
sig_resATM_p01 <- dplyr::filter(resATM_df, padj < 0.01 & abs(log2FoldChange) > lfc.cutoff)
#sig_resCRS <- left_join(x = sig_resCRS, y = grch38_df, by = "ensgene")
write_csv(sig_resATM_p05, "../results/DEresults_sig_anaVSaer_p05.csv")
write_csv(sig_resATM_p01, "../results/DEresults_sig_anaVSaer_p01.csv")
Plot log2FC on the y axis and log2 mean normalized counts on the x-axis.
Color is based on the adjusted p-value
makeMA_05 <- function(x){
p <- ggmaplot(x,
#fc of 2 corresponds to the log2fc of 1 we tested in our hyp. in the results command
fdr = 0.05,
fc = 2,
size = 1,
palette = alpha(c("limegreen", "firebrick2", "gray50"),0.35),
genenames = as.vector(x$symbol),
legend = "top",
top = 20,
select.top.method = "fc",
font.label = c(10, "bold", "black"),
font.legend = "bold",
font.main = "bold",
ggtheme = ggplot2::theme_minimal())
p
}
makeMA_01 <- function(x){
p <- ggmaplot(x,
#fc of 2 corresponds to the log2fc of 1 we tested in our hyp. in the results command
fdr = 0.01,
fc = 2,
size = 1,
palette = alpha(c("limegreen", "firebrick2", "gray50"),0.35),
genenames = as.vector(x$symbol),
legend = "top",
top = 20,
select.top.method = "fc",
font.label = c(10, "bold", "black"),
font.legend = "bold",
font.main = "bold",
ggtheme = ggplot2::theme_minimal())
p
}
makeMA_001 <- function(x){
p <- ggmaplot(x,
#fc of 2 corresponds to the log2fc of 1 we tested in our hyp. in the results command
fdr = 0.001,
fc = 2,
size = 1,
palette = alpha(c("limegreen", "firebrick2", "gray50"),0.35),
genenames = as.vector(x$symbol),
legend = "top",
top = 20,
select.top.method = "fc",
font.label = c(10, "bold", "black"),
font.legend = "bold",
font.main = "bold",
ggtheme = ggplot2::theme_minimal())
p
}
maPlotATM_05 <- makeMA_05(resATM_df) + ggtitle("Anaerobic vs Aerobic (ref)") +theme(
legend.text = element_text(size=rel(0.9)),
legend.title = element_blank(),
legend.position = "top") +
guides(colour = guide_legend(override.aes = list(alpha=0.5, size=3))) +
scale_colour_manual(values = alpha(c("limegreen", "firebrick2", "gray60"),0.5), labels = c("Upregulated (262)", "Downregulated (104)", "NS"))
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
maPlotATM_05
ggsave(plot = maPlotATM_05, filename = "../figures/anaVSaer/maPlotATM_aerREF_05.pdf", device = "pdf", height = 5, width = 6)
maPlotATM_01 <- makeMA_01(resATM_df) + ggtitle("Anaerobic vs Aerobic (ref)") +theme(
legend.text = element_text(size=rel(0.9)),
legend.title = element_blank(),
legend.position = "top") +
guides(colour = guide_legend(override.aes = list(alpha=0.5, size=3))) +
scale_colour_manual(values = alpha(c("limegreen", "firebrick2", "gray60"),0.5), labels = c("Upregulated (204)", "Downregulated (72)", "NS"))
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
maPlotATM_01
ggsave(plot = maPlotATM_01, filename = "../figures/anaVSaer/maPlotATM_aerREF_01.pdf", device = "pdf", height = 4, width = 6)
maPlotATM_001 <- makeMA_001(resATM_df) + ggtitle("Anaerobic vs Aerobic (ref)") +theme(
legend.text = element_text(size=rel(0.9)),
legend.title = element_blank(),
legend.position = "top") +
guides(colour = guide_legend(override.aes = list(alpha=0.5, size=3))) +
scale_colour_manual(values = alpha(c("limegreen", "firebrick2", "gray60"),0.5), labels = c("Upregulated (117)", "Downregulated (31)", "NS"))
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
maPlotATM_001
ggsave(plot = maPlotATM_001, filename = "../figures/anaVSaer/maPlotATM_aerREF_001.pdf", device = "pdf", height = 4, width = 6)
#Gene Count Plots Let’s create tibble objects from the meta and normalized_counts data frames before we start plotting. This will enable us to use the tidyverse functionality more easily.
# Create tibbles including row names
DE_meta <- meta %>%
rownames_to_column(var="SAMPLE_NAME") %>%
as_tibble()
normalized_counts <- normalized_counts %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
as_tibble()
Remove CRS samples from normalized counts and Meta data
norm_counts_atm <- select(normalized_counts, -starts_with("Sample.Anaerobic.CRS"))
DE_meta_atm <- DE_meta %>% filter(TEST == "anaerobic" | TEST == "aerobic")
Next I want to merge my resATM_df with normalized so that the symbol is also listed with normalized counts
normalized_resATM <- inner_join(norm_counts_atm, resATM_df, by = c("gene" = "ensgene"))
view(normalized_resATM)
The inner_join() will merge 2 data frames with respect to the “ensgene” and “gene” column, i.e. a column with the same column name in both data frames.
Often it is helpful to check the expression of multiple genes of interest at the same time. This often first requires some data wrangling.
We are going to plot the normalized count values for gene groups of interest
#making ploting function (box plot)
GroupGeneCountsBox <- function(x){
ggplot(x, aes(x = symbol, y = normalized_counts, color = TEST)) +
geom_boxplot() +
geom_point(position=position_dodge(w = 0.75)) +
scale_color_manual(limits = c("aerobic", "anaerobic"),
values = c("#003F5C","#BC5090")) +
scale_y_log10() +
xlab("Genes") +
ylab("log10 Normalized Counts") +
ggtitle("") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(plot.title = element_text(hjust = 0.5))}
Plot the normalized count values for Cytokine Genes
To do this, we first need to list the gene names of interest, then extract normalized count values for those genes.
#list genes of interest
cytogenes <- c("ENSG00000125538",
"ENSG00000115008",
"ENSG00000136689",
"ENSG00000136244",
"ENSG00000110944",
"ENSG00000169429",
"ENSG00000169245",
"ENSG00000156234",
"ENSG00000189377",
"ENSG00000198223",
"ENSG00000232810",
"ENSG00000163235",
"ENSG00000105329",
"ENSG00000109320",
"ENSG00000090339",
"ENSG00000171855"
)
#normailized counts for those cytokine genes
cyto_atm_norm <- normalized_resATM %>%
filter (gene %in% cytogenes)
Now that we have the normalized counts for each of the genes for all samples, to plot using ggplot(), we need to gather the counts for all samples into a single column to allow us to give ggplot the one column with the values we want it to plot.
The gather() function in the tidyr package will perform this operation and will output the normalized counts for all genes for the first sample listed in the first 11 rows, followed by the normalized counts for second sample in the next 11 rows, so on and so forth.
# Gathering the columns to have normalized counts to a single column
gathered_cyto_atm <- cyto_atm_norm %>%
gather(colnames(cyto_atm_norm)[2:12], key = "SAMPLE_NAME", value = "normalized_counts")
## check the column header in the "gathered" data frame
head(gathered_cyto_atm)
Now, if we want our counts colored by sample group, then we need to combine the metadata information with the melted normalized counts data into the same data frame for input to ggplot():
gathered_cyto_atm <- inner_join(DE_meta, gathered_cyto_atm, by = c("SAMPLE_NAME" = "SAMPLE_NAME"))
The inner_join() will merge 2 data frames with respect to the “SAMPLE_NAME” column, i.e. a column with the same column name in both data frames.
Now that we have a data frame in a format that can be utilised by ggplot easily, let’s plot!
## plot using GroupGeneCount function
cyto_plot <- GroupGeneCountsBox(gathered_cyto_atm) + ggtitle("Cyotkine Genes")
cyto_plot
Plot the normalized count values for Mucin Genes
To do this, we first need to list the gene names of interest, then extract normalized count values for those genes.
#list genes of interest
mucingenes <- c("ENSG00000185499",
"ENSG00000145113",
"ENSG00000169894",
"ENSG00000173702",
"ENSG00000184956",
"ENSG00000215182",
"ENSG00000117983"
)
#normailized counts for those mucin genes
mucin_atm_norm <- normalized_resATM %>%
filter (gene %in% mucingenes)
Now that we have the normalized counts for each of the genes for all samples, to plot using ggplot(), we need to gather the counts for all samples into a single column to allow us to give ggplot the one column with the values we want it to plot.
The gather() function in the tidyr package will perform this operation and will output the normalized counts for all genes for the first sample listed in the first 11 rows, followed by the normalized counts for second sample in the next 11 rows, so on and so forth.
# Gathering the columns to have normalized counts to a single column
gathered_mucin_atm <- mucin_atm_norm %>%
gather(colnames(mucin_atm_norm)[2:12], key = "SAMPLE_NAME", value = "normalized_counts")
## check the column header in the "gathered" data frame
head(gathered_mucin_atm)
Now, if we want our counts colored by sample group, then we need to combine the metadata information with the melted normalized counts data into the same data frame for input to ggplot():
gathered_mucin_atm <- inner_join(DE_meta, gathered_mucin_atm, by = c("SAMPLE_NAME" = "SAMPLE_NAME"))
The inner_join() will merge 2 data frames with respect to the “SAMPLE_NAME” column, i.e. a column with the same column name in both data frames.
Now that we have a data frame in a format that can be utilised by ggplot easily, let’s plot!
## plot using GroupGeneCount function
mucin_plot <- GroupGeneCountsBox(gathered_mucin_atm) + ggtitle("Mucin Genes")
mucin_plot
Plot the normalized count values for HIF1A
To do this, we first need to list the gene names of interest, then extract normalized count values for those genes.
#list genes of interest
HIF1A <- "ENSG00000100644"
#normailized counts for those mucin genes
hif1a_atm_norm <- normalized_resATM %>%
filter (gene %in% HIF1A)
Now that we have the normalized counts for each of the genes for all samples, to plot using ggplot(), we need to gather the counts for all samples into a single column to allow us to give ggplot the one column with the values we want it to plot.
The gather() function in the tidyr package will perform this operation and will output the normalized counts for all genes for the first sample listed in the first 11 rows, followed by the normalized counts for second sample in the next 11 rows, so on and so forth.
# Gathering the columns to have normalized counts to a single column
gathered_hif1a_atm <- hif1a_atm_norm %>%
gather(colnames(hif1a_atm_norm)[2:12], key = "SAMPLE_NAME", value = "normalized_counts")
## check the column header in the "gathered" data frame
head(gathered_hif1a_atm)
Now, if we want our counts colored by sample group, then we need to combine the metadata information with the melted normalized counts data into the same data frame for input to ggplot():
gathered_hif1a_atm <- inner_join(DE_meta, gathered_hif1a_atm, by = c("SAMPLE_NAME" = "SAMPLE_NAME"))
The inner_join() will merge 2 data frames with respect to the “SAMPLE_NAME” column, i.e. a column with the same column name in both data frames.
Now that we have a data frame in a format that can be utilised by ggplot easily, let’s plot!
## plot using GroupGeneCount function
HIF1A_plot <- GroupGeneCountsBox(gathered_hif1a_atm) + ggtitle("HIF1A")
HIF1A_plot
ggsave(cyto_plot, filename = "../figures/anaVSaer/cytokine_norm.pdf", device = "pdf", height = 4, width = 6)
ggsave(mucin_plot, filename = "../figures/anaVSaer/mucin_norm.pdf", device = "pdf", height = 4, width = 6)
ggsave(HIF1A_plot, filename = "../figures/anaVSaer/HIF1A_norm.pdf", device = "pdf", height = 4, width =6)
#Volcano Plots
library(EnhancedVolcano)
volcano_atm_01 <- EnhancedVolcano(resATM_df,
lab = resATM_df$symbol,
x = 'log2FoldChange',
y= 'pvalue',
xlim = c(-5,5),
title = "Anaerobic vs Aerobic",
subtitle = "",
caption = "",
hlineCol = 'black',
vlineCol = 'black',
colAlpha = 4/5,
FCcutoff = 1,
pCutoff = 0.01,
pointSize = 1.5,
labCol = 'black',
cutoffLineCol = 'black',
border = "full",
col = c("grey30", "#EEC537", "#8AC1BE", "#D7462E"),
legendPosition = 'none'
)
volcano_atm_01
volcano_atm_05 <- EnhancedVolcano(resATM_df,
lab = resATM_df$symbol,
x = 'log2FoldChange',
y= 'pvalue',
xlim = c(-5,5),
title = "Anaerobic vs Aerobic",
subtitle = "",
caption = "",
hlineCol = 'black',
vlineCol = 'black',
colAlpha = 4/5,
FCcutoff = 1,
pCutoff = 0.05,
pointSize = 1.5,
labCol = 'black',
cutoffLineCol = 'black',
border = "full",
col = c("grey30", "#EEC537", "#8AC1BE", "#D7462E"),
legendPosition = 'none'
)
volcano_atm_05
volcano_atm_001 <- EnhancedVolcano(resATM_df,
lab = resATM_df$symbol,
x = 'log2FoldChange',
y= 'pvalue',
xlim = c(-5,5),
title = "Anaerobic vs Aerobic",
subtitle = "",
caption = "",
hlineCol = 'black',
vlineCol = 'black',
colAlpha = 4/5,
FCcutoff = 1,
pCutoff = 0.001,
pointSize = 1.5,
labCol = 'black',
cutoffLineCol = 'black',
border = "full",
col = c("grey30", "#EEC537", "#8AC1BE", "#D7462E"),
legendPosition = 'none'
)
volcano_atm_001
ggsave(volcano_atm_01, filename = "../figures/anaVSaer/volcano_atm_01.pdf", device = "pdf", height = 6, width = 6)
ggsave(volcano_atm_05, filename = "../figures/anaVSaer/volcano_atm_05.pdf", device = "pdf", height = 6, width = 6)
ggsave(volcano_atm_001, filename = "../figures/anaVSaer/volcano_atm_001.pdf", device = "pdf", height = 6, width = 6)
Now we want to color the dots in the volcano plot by certain gene groups, first we specify the gene groups
# Define gene groups
tightjunction <- c("TJP2", "TJP1", "TJP3", "TJAP1", "CDH1", "OCLN", "CGNL1", "CGN", "SYMPK", "CTNNB1", "SAFB")
oxstress <- c("GPX1", "GPX8", "SCD","OSGIN1", "OSER1", "OXSR1", "HIF1AN", "HIF1A", "HIF3A", "CTSB", "PRDX3", "NCF2", "NQO1", "NOXO1", "PARK7", "HMOX1", "HMOX2", "NFE2L2")
erstress <- c("ERN1", "EDEM2", "EDEM1", "EDEM3", "CALM1", "ATF6B", "ATF6", "ERN1", "EIF2AK3", "SREBF1", "CANX", "TRIB3", "DDIT3", "SERP1", "STIP1")
celldeath <- c("LDHA", "DAPK2", "DAPK3", "BAD", "PDCD6IP", "PDCD2", "PDCD11", "PDCD7", "PDCD2L", "PDCD6", "PDCD4", "FAS", "TRADD", "DAP", "DAD1", "DAP3", "CIDEB", "CIDEC", "CDIP1", "PIDD1", "DEDD2")
cytokine <- c( "IL6R", "TNFRSF11B", "TRAF6", "TNFRSF10A", "IL1R1", "IL17RB", "IL17RA", "IL18R1", "MAPK15", "MAP3K10", "MAP2K2", "MAP3K3", "MAPK7", "TNFAIP6", "TNFAIP3", "TNFAIP8", "TNFAIP1", "NFKBIA", "TLR1", "TLR9", "TLR4", "TLR5", "TLR2", "TLR3", "IL10RB", "IL10RA")
mucin <- c("MUC1", "MUC4", "MUC3A", "MUC13", "MUC6", "MUC5AC", "MUC5B")
genegroups <- do.call(c, list(tightjunction, oxstress, erstress, celldeath, cytokine))
# Load in res_dfATM
res_dfATMvolcanogenes <- resATM_df
# Remove rows with l2fc or padj of "NA"
res_dfATMvolcanogenes <- res_dfATMvolcanogenes[!is.na(res_dfATMvolcanogenes$log2FoldChange), ]
res_dfATMvolcanogenes <- res_dfATMvolcanogenes[!is.na(res_dfATMvolcanogenes$padj), ]
# Populate a new column with a "1" if gene symbol corresponds to one of the genegroups and a "0" if not. Sort in descending order. When creating the plot, this will allow the genegroup points to be brought to the front and easily seen.
res_dfATMvolcanogenes %>% mutate(volcanointeger = ifelse(symbol %in% genegroups, 1, 0)) -> res_dfATMvolcanogenes
res_dfATMvolcanogenes <- res_dfATMvolcanogenes[order(res_dfATMvolcanogenes$volcanointeger),]
# Populate a new column with the gene symbol if l2fc is <-1 or >1 AND padj is <10e-10 (SPRR3 also included because it's visually significant). This makes labeling easier in the volcano plot.
res_dfATMvolcanogenes %>% mutate(siggenes = ifelse(((log2FoldChange > 1 | log2FoldChange < -1) & padj < 10e-10) | symbol == "SPRR3" , symbol, "")) -> res_dfATMvolcanogenes
# create custom key-value pairs for different cell-types
# this can be achieved with nested ifelse statements
keyvals.colorATM <-
ifelse(res_dfATMvolcanogenes$symbol %in% tightjunction, "magenta",
ifelse(res_dfATMvolcanogenes$symbol %in% oxstress, "cyan2",
ifelse(res_dfATMvolcanogenes$symbol %in% erstress, "blue",
ifelse(res_dfATMvolcanogenes$symbol %in% celldeath, "gold",
ifelse(res_dfATMvolcanogenes$symbol %in% cytokine, "red",
"grey70")))))#)
keyvals.colorATM[is.na(keyvals.colorATM)] <- "grey69"
names(keyvals.colorATM)[keyvals.colorATM == "grey70"] <- 'Z rest'
names(keyvals.colorATM)[keyvals.colorATM == "magenta"] <- 'Tight Junctions'
names(keyvals.colorATM)[keyvals.colorATM == "cyan2"] <- 'Oxidative Stress'
names(keyvals.colorATM)[keyvals.colorATM == "blue"] <- 'ER Stress'
names(keyvals.colorATM)[keyvals.colorATM == "gold"] <- 'Cell Death'
names(keyvals.colorATM)[keyvals.colorATM == "red"] <- 'Cytokines'
volcano_ATM_group_05 <- EnhancedVolcano(res_dfATMvolcanogenes,
lab = res_dfATMvolcanogenes$siggenes,
labSize = 3.5,
boxedLabels = FALSE,
drawConnectors = TRUE,
widthConnectors = 0.05,
colConnectors = "grey30",
typeConnectors = "closed",
endsConnectors = "first",
lengthConnectors = unit(10e-5, 'npc'),
x = 'log2FoldChange',
y = 'padj',
xlim = c(-4,4),
title = NULL,
subtitle = "",
caption = "",
colCustom = keyvals.colorATM,
hlineCol = 'black',
vlineCol = 'black',
colAlpha = 0.5,
pointSize = 3,
FCcutoff = 1,
pCutoff = 0.05,
labCol = 'black',
cutoffLineCol = 'black',
border = "full",
legendPosition = "none"
)
volcano_ATM_group_05
volcano_ATM_group_01 <- EnhancedVolcano(res_dfATMvolcanogenes,
lab = res_dfATMvolcanogenes$siggenes,
labSize = 3.5,
boxedLabels = FALSE,
drawConnectors = TRUE,
widthConnectors = 0.05,
colConnectors = "grey30",
typeConnectors = "closed",
endsConnectors = "first",
lengthConnectors = unit(10e-5, 'npc'),
x = 'log2FoldChange',
y = 'padj',
xlim = c(-4,4),
title = NULL,
subtitle = "",
caption = "",
colCustom = keyvals.colorATM,
hlineCol = 'black',
vlineCol = 'black',
colAlpha = 0.5,
pointSize = 3,
FCcutoff = 1,
pCutoff = 0.01,
labCol = 'black',
cutoffLineCol = 'black',
border = "full",
legendPosition = "none"
)
volcano_ATM_group_01
volcano_ATM_group_001 <- EnhancedVolcano(res_dfATMvolcanogenes,
lab = res_dfATMvolcanogenes$siggenes,
labSize = 3.5,
boxedLabels = FALSE,
drawConnectors = TRUE,
widthConnectors = 0.05,
colConnectors = "grey30",
typeConnectors = "closed",
endsConnectors = "first",
lengthConnectors = unit(10e-5, 'npc'),
x = 'log2FoldChange',
y = 'padj',
xlim = c(-4,4),
title = NULL,
subtitle = "",
caption = "",
colCustom = keyvals.colorATM,
hlineCol = 'black',
vlineCol = 'black',
colAlpha = 0.5,
pointSize = 3,
FCcutoff = 1,
pCutoff = 0.001,
labCol = 'black',
cutoffLineCol = 'black',
border = "full",
legendPosition = "none"
)
volcano_ATM_group_001
ggsave(volcano_ATM_group_01, filename = "../figures/anaVSaer/volcano_atm_group_01.pdf", device = "pdf", height = 6, width = 6)
ggsave(volcano_ATM_group_05, filename = "../figures/anaVSaer/volcano_atm_group_05.pdf", device = "pdf", height = 6, width = 6)
ggsave(volcano_ATM_group_001, filename = "../figures/anaVSaer/volcano_atm_group_001.pdf", device = "pdf", height = 6, width = 6)
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnhancedVolcano_1.6.0 annotables_0.1.91 AnnotationHub_2.20.2 BiocFileCache_1.12.1
[5] dbplyr_1.4.4 cowplot_1.0.0 wesanderson_0.3.6 ggrepel_0.8.2
[9] ggpubr_0.4.0 DEGreport_1.24.1 pheatmap_1.0.12 DESeq2_1.28.1
[13] SummarizedExperiment_1.18.2 DelayedArray_0.14.1 matrixStats_0.56.0 Biobase_2.48.0
[17] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1
[21] BiocGenerics_0.34.0 RColorBrewer_1.1-2 forcats_0.5.0 stringr_1.4.0
[25] dplyr_1.0.2 purrr_0.3.4 readr_1.3.1 tidyr_1.1.2
[29] tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.1.9 circlize_0.4.10 plyr_1.8.6
[5] ConsensusClusterPlus_1.52.0 splines_4.0.2 BiocParallel_1.22.0 digest_0.6.25
[9] invgamma_1.1 htmltools_0.5.0 SQUAREM_2020.5 fansi_0.4.1
[13] magrittr_1.5 memoise_1.1.0 cluster_2.1.0 openxlsx_4.1.5
[17] limma_3.44.3 ComplexHeatmap_2.4.3 annotate_1.66.0 Nozzle.R1_1.1-1
[21] modelr_0.1.8 colorspace_1.4-1 blob_1.2.1 rvest_0.3.6
[25] rappdirs_0.3.1 haven_2.3.1 xfun_0.16 crayon_1.3.4
[29] RCurl_1.98-1.2 jsonlite_1.7.0 genefilter_1.70.0 survival_3.1-12
[33] glue_1.4.2 gtable_0.3.0 zlibbioc_1.34.0 XVector_0.28.0
[37] GetoptLong_1.0.4 car_3.0-9 shape_1.4.5 abind_1.4-5
[41] scales_1.1.1 DBI_1.1.0 edgeR_3.30.3 rstatix_0.6.0
[45] Rcpp_1.0.5 xtable_1.8-4 lasso2_1.2-21.1 tmvnsim_1.0-2
[49] clue_0.3-57 foreign_0.8-80 bit_4.0.4 truncnorm_1.0-8
[53] httr_1.4.2 ellipsis_0.3.1 pkgconfig_2.0.3 reshape_0.8.8
[57] XML_3.99-0.5 farver_2.0.3 locfit_1.5-9.4 later_1.1.0.1
[61] tidyselect_1.1.0 labeling_0.3 rlang_0.4.7 AnnotationDbi_1.50.3
[65] munsell_0.5.0 BiocVersion_3.11.1 cellranger_1.1.0 tools_4.0.2
[69] cli_2.0.2 generics_0.0.2 RSQLite_2.2.0 broom_0.7.0
[73] evaluate_0.14 fastmap_1.0.1 ggdendro_0.1.22 yaml_2.2.1
[77] knitr_1.29 bit64_4.0.4 fs_1.5.0 zip_2.1.1
[81] nlme_3.1-148 mime_0.9 xml2_1.3.2 compiler_4.0.2
[85] rstudioapi_0.11 curl_4.3 png_0.1-7 interactiveDisplayBase_1.26.3
[89] ggsignif_0.6.0 reprex_0.3.0 geneplotter_1.66.0 stringi_1.4.6
[93] lattice_0.20-41 Matrix_1.2-18 psych_2.0.9 vctrs_0.3.4
[97] pillar_1.4.6 lifecycle_0.2.0 BiocManager_1.30.10 GlobalOptions_0.1.2
[101] irlba_2.3.3 data.table_1.13.0 bitops_1.0-6 httpuv_1.5.4
[105] R6_2.4.1 promises_1.1.1 rio_0.5.16 MASS_7.3-51.6
[109] assertthat_0.2.1 rjson_0.2.20 withr_2.2.0 mnormt_2.0.2
[113] GenomeInfoDbData_1.2.3 hms_0.5.3 grid_4.0.2 rmarkdown_2.3
[117] ashr_2.2-47 carData_3.0-4 logging_0.10-108 mixsqp_0.3-43
[121] base64enc_0.1-3 shiny_1.5.0 lubridate_1.7.9